Implementing an untrusted operating system on trusted hardware
David Lie, Chandramohan A. Thekkath, Mark Horowitz 
ACM symposium on Operating systems principles SOSP '03, volume 37 issue
Recently, there has been considerable interest in providing "trusted computing platforms"
using hardware, TCPA and Palladium being the most publicly visible examples. In this
paper we discuss our experience with building such a platform using a traditional
time-sharing operating system executing on XOM, a processor architecture that provides
copy protection and tamper-resistance functions. In XOM, only the processor is trusted; 
main memory and the operating system are not trusted. Our opera ... 



Keywords: XOM, XOMOS, untrusted operating systems 



An evaluation of memory consistency models for shared-memory systems with ILP 
processors 

Vijay S. Pai, Parthasarathy Ranganathan, Sarita V. Adve, Tracy Harton

September 1996 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems ASPLOS-VII, Volume 31, 30 Issue 9, 5
Publisher: ACM Press 


Relaxed consistency models have been shown to significantly outperform sequential 
consistency for single-issue, statically scheduled processors with blocking reads. However, 
current microprocessors aggressively exploit instruction-level parallelism (ILP) using 
methods such as multiple issue, dynamic scheduling, and non-blocking reads. 
Researchers have conjectured that two techniques, hardware-controlled non-binding 
prefetching and speculative loads, have the potential to equalize the hardware pe ... 

The STAMPede approach to thread-level speculation

J. Gregory Steffan, Christopher Colohan, Antonia Zhai, Todd C. Mowry 
August 2005 ACM Transactions on Computer Systems (TOCS), volume 23 issue 3
Publisher: ACM Press 


Multithreaded processor architectures are becoming increasingly commonplace: many 
current and upcoming designs support chip multiprocessing, simultaneous multithreading, 
or both. While it is relatively straightforward to use these architectures to improve the 
throughput of a multithreaded or multiprogrammed workload, the real challenge is how to 
easily create parallel software to allow single programs to effectively exploit all of this raw 
performance potential. One promising technique fo ... 

Keywords: Thread-level speculation, automatic parallelization, cache coherence, chip-multiprocessing



A scalable approach to thread-level speculation
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, Todd C. Mowry
May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual international symposium on Computer architecture ISCA '00, Volume 28 Issue 2
Publisher: ACM Press 


While architects understand how to build cost-effective parallel machines across a wide 
spectrum of machine sizes (ranging from within a single chip to large-scale servers), the 
real challenge is how to easily create parallel software to effectively exploit all of this raw 
performance potential. One promising technique for overcoming this problem is Thread- 
Level Speculation (TLS), which enables the compiler to optimistically create parallel 
threads ... 

Evaluating the performance of four snooping cache coherency protocols 
S. J. Eggers, R. H. Katz 

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th annual international symposium on Computer architecture ISCA '89, Volume 17 Issue 3
Publisher: ACM Press 


Write-invalidate and write-broadcast coherency protocols have been criticized for being
unable to achieve good bus performance across all cache configurations. In particular, 
write-invalidate performance can suffer as block size increases; and large cache sizes will 
hurt write-broadcast. Read-broadcast and competitive snooping extensions to the 
protocols have been proposed to solve each problem. Our results indicate that the 
benefits of the extensions are limited. Read-broadcast ... 

6 Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Jose F. Martinez, Josep Torrellas

October 2002 ACM SIGOPS Operating Systems Review, ACM SIGARCH Computer Architecture News, ACM SIGPLAN Notices, Proceedings of the 10th international conference on Architectural support for programming languages and operating systems ASPLOS-X, Volume 36, 30, 37 Issue 5, 5, 10

Publisher: ACM Press 

Barriers, locks, and flags are synchronizing operations widely used by programmers and
parallelizing compilers to produce race-free parallel programs. Often times, these
operations are placed suboptimally, either because of conservative assumptions about the
program, or merely for code simplicity. We propose Speculative Synchronization, which
applies the philosophy behind Thread-Level Speculation (TLS) to explicitly parallel
applications. Speculative threads execute past active barriers, busy ... 

Memory coherence in shared virtual memory systems
Kai Li, Paul Hudak 

November 1989 ACM Transactions on Computer Systems (TOCS), volume 7 issue 4 
Publisher: ACM Press 


The memory coherence problem in designing and implementing a shared virtual memory 
on loosely coupled multiprocessors is studied in depth. Two classes of algorithms, 
centralized and distributed, for solving the problem are presented. A prototype shared 
virtual memory on an Apollo ring based on these algorithms has been implemented. Both 
theoretical and practical results show that the memory coherence problem can indeed be 
solved efficiently on a loosely coupled multiprocessor. 

Performance analysis of multiprocessor cache consistency protocols using
generalized timed Petri nets 
Mary K. Vernon, Mark A. Holliday 

May 1986 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 
1986 ACM SIGMETRICS joint international conference on Computer 
performance modelling, measurement and evaluation SIGMETRICS 
'86/ PERFORMANCE '86, Volume 14 Issue 1 
Publisher: ACM Press 


We use an exact analytical technique, based on Generalized Timed Petri Nets (GTPNs), to
study the performance of shared bus cache consistency protocols for multiprocessors. We
develop a general framework within which the key characteristics of the Write-Once
protocol and four enhancements that have been combined in various ways in the
literature can be identified and evaluated. We then quantitatively assess the performance
gains for each of the four enhancements. We conside ... 

Tolerating latency in multiprocessors through compiler-inserted prefetching
Todd C. Mowry 

February 1998 ACM Transactions on Computer Systems (TOCS), volume 16 issue 1
Publisher: ACM Press 


The large latency of memory accesses in large-scale shared-memory multiprocessors is a 
key obstacle to achieving high processor utilization. Software-controlled prefetching is a 
technique for tolerating memory latency by explicitly executing instructions to move data 
close to the processor before the data are actually needed. To minimize the burden on the 
programmer, compiler support is needed to automatically insert prefetch instructions into 
the code. A key challenge when ... 

Keywords: compiler optimization, prefetching

Store Memory-Level Parallelism Optimizations for Commercial Applications
Yuan Chou, Lawrence Spracklen, Santosh G. Abraham

November 2005 Proceedings of the 38th annual IEEE/ ACM International Symposium 
on Microarchitecture MICRO 38 

Publisher: IEEE Computer Society 


This paper studies the impact of off-chip store misses on processor performance for 
modern commercial applications. The performance impact of off-chip store misses is 
largely determined by the extent of their overlap with other off-chip cache misses. The 
epoch MLP model is used to explain and quantify how these overlaps are affected by
various store handling optimizations and by the memory consistency model implemented
by the processor. The extent of these overlaps are then translated to off-chi ... 

11 A single cached copy data coherence scheme for multiprocessor systems
A. Mendelson, D. K. Pradhan, A. D. Singh

December 1989 ACM SIGARCH Computer Architecture News, volume 17 issue 6
Publisher: ACM Press 


We present and evaluate a snoopy cache memory protocol, the Single Cache Copy Data 
Coherence (SCCDC), for multiprocessors that allows only a single cache to hold a given 
shared data at any time. The simulations presented here indicate that despite its
simplicity, the scheme has the potential for good performance comparable with more 
complex snoopy cache schemes. We have also shown in related work [8] that the 
existence of only a single copy of data in cache allows efficient access control to sh ... 

12 Real-time shading

Marc Olano, Kurt Akeley, John C. Hart, Wolfgang Heidrich, Michael McCool, Jason L. Mitchell,
Randi Rost

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 

Publisher: ACM Press 


Real-time procedural shading was once seen as a distant dream. When the first version of 
this course was offered four years ago, real-time shading was possible, but only with one- 
of-a-kind hardware or by combining the effects of tens to hundreds of rendering passes. 
Today, almost every new computer comes with graphics hardware capable of interactively 
executing shaders of thousands to tens of thousands of instructions. This course has been 
redesigned to address today's real-time shading capabili ... 

Session 6: threads: Thread-Level Speculation on a CMP can be energy efficient
Jose Renau, Karin Strauss, Luis Ceze, Wei Liu, Smruti Sarangi, James Tuck, Josep Torrellas
June 2005 Proceedings of the 19th annual international conference on Supercomputing ICS '05
Publisher: ACM Press 


Chip Multiprocessors (CMP) with Thread-Level Speculation (TLS) have become the subject
of intense research. However, TLS is suspected of being too energy inefficient to compete 
against conventional processors. In this paper, we refute this claim. To do so, we first 
identify the main sources of dynamic energy consumption in TLS. Then, we present 
simple energy-saving optimizations that cut the energy cost of TLS by over 60% on 
average with minimal performance impact. The resulting TLS CMP, populat ...

memory and communication performance
Gheith A. Abandah, Edward S. Davidson 
April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th annual international symposium on Computer architecture ISCA '98, Volume 26 Issue 3

Publisher: IEEE Computer Society, ACM Press 


Advances in microarchitecture, packaging, and manufacturing processes enable designers 
to build new systems with higher performance and scalability. Using microbenchmark 
techniques, we contrast the memory and communication performance of two generations 
of the HP/Convex Exemplar scalable parallel processing system. The SPP1000 and
SPP2000 have significant architectural and implementation differences, but maintain 
upward binary compatibility. The SPP2000 employs manufacturing and packaging 
advanc ... 

15 Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

Maria Jesus Garzaran, Milos Prvulovic, Jose Maria Llaberia, Victor Viñals, Lawrence
Rauchwerger, Josep Torrellas

September 2005 ACM Transactions on Architecture and Code Optimization (TACO), Volume 2 Issue 3
Publisher: ACM Press 


Thread-Level Speculation (TLS) provides architectural support to aggressively run hard- 
to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or
speculative memory state that needs to be separately buffered and managed in the
presence of distributed caches and buffers. Such a state may contain multiple versions of
the same variable. In this paper, we introduce a novel taxonomy of approaches to buffer
and manage multiversion speculative memory state in mul ...

Keywords: Caching and buffering support, coherence protocol, memory hierarchies, 
shared-memory multiprocessors, thread-level speculation 



16 Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip
Multiprocessors

Meyrem Kirman, Nevin Kirman, Jose F. Martinez

November 2005 Proceedings of the 38th annual IEEE/ACM International Symposium 
on Microarchitecture MICRO 38 

Publisher: IEEE Computer Society 


Checkpointed Early Resource Recycling (Cherry) is a recently-proposed micro-architectural
technique that aims at improving critical resource utilization by performing
aggressive resource recycling decoupled from instruction retirement, using a
checkpoint/rollback mechanism to recover from occasional incorrect execution. In this
paper, we explore correctness and performance issues that arise when Cherry-enabled
processors are used in chip multiprocessor architectures. We propose mechanisms to
addre ...

Milos Prvulovic, Josep Torrellas

May 2003 ACM SIGARCH Computer Architecture News, Proceedings of the 30th annual international symposium on Computer architecture ISCA '03, Volume 31 Issue 2

Publisher: ACM Press 


While removing software bugs consumes vast amounts of human time, hardware support 
for debugging in modern computers remains rudimentary. Fortunately, we show that 
mechanisms for Thread-Level Speculation (TLS) can be reused to boost debugging
productivity. Most notably, TLS's rollback capabilities can be extended to support rolling
back recent buggy execution and repeating it as many times as necessary until the bug is
fully characterized. These incremental re-executions are deterministic even i ...



18 A characterization of sharing in parallel programs and its application to coherency 
protocol evaluation 
S. J. Eggers, R. H. Katz 

May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th Annual International Symposium on Computer architecture ISCA '88, Volume 16 Issue 2
Publisher: IEEE Computer Society Press, ACM Press

In this paper we use trace-driven simulation to analyze the memory reference patterns of
write shared data in several parallel applications. We first develop a characterization of 
write sharing (based on the notion of a write run), and then examine the traces, using 
metrics derived from the characterization. The results indicate that the amount of write 
sharing in all programs is small; and that it is characterized by short to medium 
sequences of per processor references, with little conten ... 

19 An adaptive cache coherence protocol optimized for migratory sharing
Per Stenstrom, Mats Brorsson, Lars Sandberg 

May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th annual international symposium on Computer architecture ISCA '93, Volume 21 Issue 2
Publisher: ACM Press 

Parallel programs that use critical sections and are executed on a shared-memory
multiprocessor with a write-invalidate protocol result in invalidation actions that could be 
eliminated. For this type of sharing, called migratory sharing, each processor typically 
causes a cache miss followed by an invalidation request which could be merged with the 
preceding cache-miss request. In this paper we propose an adaptive protocol that invokes 
this optimization dynamically for migratory b ...

Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed
multiprocessors are three popular architectural techniques based on the execution of 
multiple, cooperating speculative threads. In these environments, correctly maintaining 
data dependences across threads requires mechanisms for disambiguating addresses 
across threads, invalidating stale cache state, and making committed state visible. These
mechanisms are both conceptually involved and hard to implement. In this paper, ... 

Results 1 - 20 of 111




